Title: System & Methodology for identifying statistically significant events in a
monitored process

This invention relates to a method, apparatus, and computer program for identifying significant events in 
a monitored process. 

Background to the invention 

Ever since the concept of a process, or indeed any measurable activity, has existed, there has been the
desire to measure it and improve upon it. In the modern world we encounter processes everywhere, from
ordering a sandwich to building a car. Any of these processes can be monitored graphically by plotting
some measure against the relevant process dimension, which is typically time but could be any other
dimension, such as length.

In the 1950s a statistician called Deming took this idea to the manufacturing industry and showed that, by
applying a statistical rule to the data being displayed, it was possible to show on the chart those points
which were part of the normal variability of a process and those that were outside it. These charts
superimpose three extra lines on the data: an average line and upper and lower process guidelines, the
position of these lines on the chart being derived from the data rather than chosen arbitrarily.
Significant events, by which we mean something out of the ordinary, are those points outside the process
guidelines, and by investigating and acting upon these, improvements in the performance of the process
can be achieved. The charts are well known in the manufacturing industry as Statistical Process Control
(SPC) charts and have been widely used in manufacturing since the early 1950s to great effect.

The problem faced by those seeking to implement SPC charts for a process within a business is that
there is simply no enterprise-wide, simple-to-configure, general-purpose SPC tool that is relevant to
everyone in an organisation. To date, SPC charts and the software that displays them have remained in the
domain of the statisticians and engineers looking after complex manufacturing processes.



Summary of the Invention 

Embodiments of the present invention include a system, method and computer program for obtaining
measure data from one or more external systems, processing and analysing the data, storing the data, and
then displaying the data in the form of a dashboard of dials which link to various charts, including SPC
charts. Each person in the organisation has the option to view charts that are uniquely relevant to them.
Furthermore, preferred embodiments introduce new statistical functions to the SPC charts so that
business processes can be accurately modelled. Preferably the system uses standard web browser
technology to view the charts, and scales from a single-user installation to the entire organisation.

One embodiment of the invention provides a method of storing organisational information separately from
the measure data in such a way that an organisational hierarchy (OH) can be modified, including the
addition of hierarchy levels, without change to the Database Schema or measure data, thereby
producing a more maintainable database.

The OH is, in general, a set of hierarchies, joined into one hierarchy by grouping nodes, analogous to
folders or directories in an Operating System. Each such hierarchy can be considered a 'Dimension' from
an OLAP perspective. Dimensions can also occur anywhere in a hierarchy, to allow common structure at
the top of hierarchies to be factored out. In the following example, the dimension nodes are Geographical,
Engineers, Equipment and Functional, and the top-level grouping node is Organisation:

Organisation
  Geographical
    North
      Engineers
        Eng01
        Eng02
      Equipment
        Item1
        Item2
    South
      Engineers
        Eng03
        Eng04
      Equipment
        Item3
        Item4
  Functional
    Sales
    Support

Here we see Engineers and Equipment dimensions nested inside a Geographical dimension to 
save repeating the North/South breakdown which is common to both. 

Typically, Data is related (or 'belongs') only to the leaves of the hierarchies, though due to pre- 
aggregation or coarse granularity of data, this may not always be the case. Data is considered 
'Appropriate' to a node in the hierarchy if it belongs to that node or any node below. Data 
Appropriate to a node is displayed in charts for that node. This method of aggregation allows the 
hierarchy to be restructured without any Data implications. For example, another level, Country, 
might be added in the above hierarchy below Geographical, with North and South now coming 
under each Country. Only the Database Table representing the Hierarchy itself would need to be 
modified, and then only its contents - no new columns or Dimension Tables would be required. 
A row of Data belongs to a node in this hierarchy, for a given Measure, if one of the ownership 
expressions in the definition of that Measure evaluates to the 'Owner ID' of that node. This is the 
only link between the OH and the Data. 

This mechanism allows a row of Data to belong to different nodes in the OH for different 
Measures; the relationship between the Data and the OH is a parameter of each Measure 
Definition. This contrasts with the more common, and less flexible, Foreign Key link typical of 
OLAP systems. 
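The ownership rule can be illustrated with a short Python sketch. It assumes each node stores a single Owner ID and that an ownership expression is any function from a data row to a value; the class names `Node` and `MeasureDefinition`, the column `cEngineer` and the IDs `E01` to `E04` are hypothetical. Data Appropriate to a node is found by collecting the Owner IDs of that node and of every node below it, so the data rows never reference the structure of the hierarchy.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

Row = Dict[str, object]


@dataclass
class Node:
    """A node in the organisational hierarchy (OH)."""
    name: str
    owner_id: Optional[str] = None  # value an ownership expression must produce
    children: List["Node"] = field(default_factory=list)

    def owner_ids(self) -> Set[str]:
        """Owner IDs of this node and of every node below it."""
        ids = {self.owner_id} if self.owner_id is not None else set()
        for child in self.children:
            ids |= child.owner_ids()
        return ids


@dataclass
class MeasureDefinition:
    name: str
    # Each ownership expression maps a data row to a candidate Owner ID.
    ownership_expressions: List[Callable[[Row], object]]


def belongs_to(row: Row, node: Node, measure: MeasureDefinition) -> bool:
    """True if any ownership expression evaluates to this node's Owner ID."""
    return any(expr(row) == node.owner_id for expr in measure.ownership_expressions)


def appropriate_rows(rows: List[Row], node: Node, measure: MeasureDefinition) -> List[Row]:
    """Rows that belong to `node` or to any node below it, for this measure."""
    ids = node.owner_ids()
    return [r for r in rows if any(expr(r) in ids for expr in measure.ownership_expressions)]


# Usage: the Engineers part of the example hierarchy.
north = Node("North", children=[Node("Engineers", children=[Node("Eng01", "E01"), Node("Eng02", "E02")])])
south = Node("South", children=[Node("Engineers", children=[Node("Eng03", "E03"), Node("Eng04", "E04")])])
geographical = Node("Geographical", children=[north, south])
calls_closed = MeasureDefinition("Calls closed", [lambda r: r["cEngineer"]])
rows = [{"cEngineer": "E01", "value": 5}, {"cEngineer": "E03", "value": 2}, {"cEngineer": "E02", "value": 1}]
```

Inserting a Country level between Geographical and North/South would only add `Node` objects; `rows` and `calls_closed` stay unchanged.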

Similarly, measure information is stored separately from the measure data so that the measure
hierarchy can be modified without change to the measure data.

A method is also provided of jumping via a URL link from stored underlying measure data to a 
third party application. 

Each Data Table used to hold the data from which Measure Values are calculated can have one
or more 'URL Templates' associated with it. Such a template is a mapping from values in a row
of data in that table to a URL. Each URL Template is assigned to a particular column in the Data
Table (typically one of the columns from which the URL is created).

If such a template exists for the Data Table used by a Measure, a corresponding option will
become available on the menu from which a 'Data Form' is requested for that Measure. This
Data Form will then show the values in the assigned column underlined and in blue, as is
normally the case for Browser Hyperlinks. Clicking on a value in this column will open a new
Browser window and load it from the URL calculated by applying the URL Template to values
from this row of data.

The mapping from Data Table values to URL is effected by means of 'placeholders'. A 
placeholder is simply the name of a Data Table column enclosed in braces. The following is an 
example URL Template: 



http://www.myCompany.com/myApp?ref={cRefNo}&empId={cEmpId}
(cRefNo and cEmpId are columns in the Data Table)

This Template would typically be assigned to the column cRefNo; if the cRefNo value is
'R/1234' and the cEmpId in that row has value 'E567', clicking on the cRefNo Hyperlink will open
a new Browser window loaded from URL:

http://www.myCompany.com/myApp?ref=R%2f1234&empId=E567
(note that the '/' is escaped as '%2f', as required in URLs.)
This allows integration with existing Web-based applications.
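A minimal sketch of placeholder expansion, assuming placeholders are column names made of word characters and that every substituted value is percent-encoded with Python's `urllib.parse.quote`. The function name `expand_url_template` is illustrative. Python emits upper-case hex digits (`%2F`); hex digits in percent-encoding are case-insensitive, so this is equivalent to `%2f`.

```python
import re
from urllib.parse import quote

# A placeholder is a Data Table column name enclosed in braces, e.g. {cRefNo}.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def expand_url_template(template: str, row: dict) -> str:
    """Replace each {column} placeholder with the row's value, percent-encoded."""
    def substitute(match: re.Match) -> str:
        value = str(row[match.group(1)])  # KeyError if the column is missing
        # safe="" makes quote() encode "/" too, so R/1234 becomes R%2F1234.
        return quote(value, safe="")
    return PLACEHOLDER.sub(substitute, template)


row = {"cRefNo": "R/1234", "cEmpId": "E567"}
url = expand_url_template("http://www.myCompany.com/myApp?ref={cRefNo}&empId={cEmpId}", row)
```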

An extension to URL Templates allows integration with Applications running on the client machine.
If the URL Template begins with the string 'exec ' then the rest of the Template, once the
placeholders have been replaced with the corresponding data values, is used as a local command
string. In this case no escaping of special characters occurs. For example, the following would
open Notepad with the filename specified in the cFilename column of a Data Table, in folder
C:\Temp (this would only work on a Windows client):

exec notepad "C:\Temp\{cFilename}"

This allows integration with locally installed and thick-client applications. 
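The 'exec ' extension changes only the substitution step: values are inserted verbatim and the result is treated as a local command string rather than a URL. The hypothetical function `resolve_template` below returns the kind of target together with the expanded text; actually launching the command is deliberately left out of this sketch.

```python
import re
from urllib.parse import quote

PLACEHOLDER = re.compile(r"\{(\w+)\}")
EXEC_PREFIX = "exec "


def resolve_template(template: str, row: dict) -> tuple:
    """Return ("command", text) for 'exec ' templates, else ("url", text)."""
    if template.startswith(EXEC_PREFIX):
        body = template[len(EXEC_PREFIX):]
        # No escaping: values go into the local command string unchanged.
        return "command", PLACEHOLDER.sub(lambda m: str(row[m.group(1)]), body)
    # Ordinary URL Template: percent-encode every substituted value.
    return "url", PLACEHOLDER.sub(lambda m: quote(str(row[m.group(1)]), safe=""), template)


kind, text = resolve_template(r'exec notepad "C:\Temp\{cFilename}"', {"cFilename": "notes.txt"})
```

Because values are inserted unescaped, a real launcher would need to trust or validate the data in the source column before running the command.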

A method is provided for selecting data from multiple points in both an organisational hierarchy 
and a metric hierarchy, to allow disjunction in the case of multiple points selected within the 
same dimension, and multidimensional analysis in the case of points selected in different 
dimensions. 
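One way to read the selection rule, sketched in Python under stated assumptions: each selected point has already been expanded to the set of values at or below it (as with Appropriate data), points in the same dimension are combined with OR, and points in different dimensions are combined with AND to give the multidimensional slice. The column names `engineer` and `equipment` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Row = Dict[str, object]


def select_rows(rows: List[Row], selections: Iterable[Tuple[str, Set[object]]]) -> List[Row]:
    """Filter rows by the points selected in the hierarchies.

    Each selection is (dimension, allowed values at or below the selected node).
    """
    allowed_by_dimension: Dict[str, Set[object]] = defaultdict(set)
    for dimension, allowed in selections:
        allowed_by_dimension[dimension] |= set(allowed)  # OR within a dimension
    return [
        row
        for row in rows
        # AND across dimensions: the row must match every selected dimension.
        if all(row.get(dim) in allowed for dim, allowed in allowed_by_dimension.items())
    ]


rows = [
    {"engineer": "E01", "equipment": "I1", "value": 3},
    {"engineer": "E03", "equipment": "I1", "value": 4},
    {"engineer": "E04", "equipment": "I3", "value": 1},
]
# North engineers OR engineer E03, restricted to equipment item I1.
selected = select_rows(rows, [("engineer", {"E01", "E02"}), ("engineer", {"E03"}), ("equipment", {"I1"})])
```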

A method is provided for displaying a Correlation chart, which shows two or more 
measures displayed on the same chart together with guidelines that allow the degree of 
correlation to be assessed visually. 

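The correlation coefficient is an established statistic used to measure the degree of association
between two data series. Thus given one series of data values:

$$ x_{t_1}, x_{t_2}, x_{t_3}, \ldots, x_{t_n} $$

observed at times

$$ t_1, t_2, t_3, \ldots, t_n $$

and a second series of data values

$$ y_{t_1}, y_{t_2}, y_{t_3}, \ldots, y_{t_n} $$

observed at the same times, the correlation coefficient may be calculated as

$$ r = \frac{1}{n} \sum_{i=1}^{n} X_{t_i} Y_{t_i} $$

where

$$ X_{t_i} = \frac{x_{t_i} - \mu_x}{\sigma_x}, \qquad Y_{t_i} = \frac{y_{t_i} - \mu_y}{\sigma_y}, \qquad i = 1, 2, 3, \ldots, n $$

and $\mu_x$, $\sigma_x$ ($\mu_y$, $\sigma_y$) are the mean and standard deviation of the $x$ ($y$) series, the
standard deviation being computed with divisor $n$.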

The usual method that is employed to allow a visual assessment to be made of the degree of
association between two data series is to construct a graph in which the data points have
co-ordinates:

$$ (x_{t_i}, y_{t_i}), \qquad i = 1, 2, 3, \ldots, n. $$

However, instead of this, the Correlation Chart shows both data series plotted on one graph with
common axes, one data series displayed as the co-ordinates

$$ (t_i, X_{t_i}), \qquad i = 1, 2, 3, \ldots, n $$

and the other data series displayed as the co-ordinates

$$ (t_i, Y_{t_i}), \qquad i = 1, 2, 3, \ldots, n, $$

so that time is the abscissa.

The benefits of this alternative approach are that: 

(a) It preserves the order in time of the data, which is an essential property of performance data.

(b) The standard deviation $\sigma_x$ is a measure of the dispersion of the data
$x_{t_1}, x_{t_2}, x_{t_3}, \ldots, x_{t_n}$; likewise the standard deviation $\sigma_y$ is a measure of the
dispersion of the data $y_{t_1}, y_{t_2}, y_{t_3}, \ldots, y_{t_n}$. Dividing by these standard deviations
normalises the two data series so that they have similar dispersions and can be plotted on the same
Correlation Chart with a common ordinate axis.

(c) The original two data series

$$ x_{t_1}, x_{t_2}, x_{t_3}, \ldots, x_{t_n} $$

and

$$ y_{t_1}, y_{t_2}, y_{t_3}, \ldots, y_{t_n} $$

will have a high degree of association, and hence be correlated, if a graph of the co-ordinates

$$ (t_i, x_{t_i}), \qquad i = 1, 2, 3, \ldots, n $$

and a graph of the co-ordinates

$$ (t_i, y_{t_i}), \qquad i = 1, 2, 3, \ldots, n $$

display similar patterns, i.e. similar upward or downward trends or changes with time.
The transformations

$$ X_{t_i} = \frac{x_{t_i} - \mu_x}{\sigma_x}, \qquad i = 1, 2, 3, \ldots, n $$

and

$$ Y_{t_i} = \frac{y_{t_i} - \mu_y}{\sigma_y}, \qquad i = 1, 2, 3, \ldots, n $$

change only the location and scale of the graphs in the ordinate direction, so that graphs of

$$ (t_i, x_{t_i}), \qquad i = 1, 2, 3, \ldots, n $$

and

$$ (t_i, X_{t_i}), \qquad i = 1, 2, 3, \ldots, n $$

will display identical patterns, and graphs of

$$ (t_i, y_{t_i}), \qquad i = 1, 2, 3, \ldots, n $$

and

$$ (t_i, Y_{t_i}), \qquad i = 1, 2, 3, \ldots, n $$

will also display identical patterns. Hence if the original two data series display a high degree of
association, so will the two transformed series, and vice versa.
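The correlation coefficient and the standardised series plotted on a Correlation Chart can be computed as follows. This sketch uses the population standard deviation (divisor n), under which r is the mean product of the standardised values; a constant series has zero standard deviation and raises `ZeroDivisionError`. The function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def standardise(series: Sequence[float]) -> List[float]:
    """Return (value - mean) / sd, using the population standard deviation."""
    n = len(series)
    mean = sum(series) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    return [(v - mean) / sd for v in series]


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient as the mean product of standardised values."""
    if len(x) != len(y):
        raise ValueError("series must be observed at the same times")
    X, Y = standardise(x), standardise(y)
    return sum(a * b for a, b in zip(X, Y)) / len(X)


def chart_points(times: Sequence[float], series: Sequence[float]) -> List[Tuple[float, float]]:
    """Co-ordinates (t_i, X_{t_i}) for one series on a Correlation Chart."""
    return list(zip(times, standardise(series)))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
```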

The Correlation Chart contains two performance guideline lines that are calculated from the
data and used to assess the statistical significance of any correlation between the data series
displayed on the chart. The usual approach to assessing the statistical significance of
correlation is to test the null hypothesis that there is no correlation between two data series against
the alternative hypothesis that the two data series are correlated. However, on a Correlation
Chart, if two data series are highly correlated they will give co-ordinates that are close together
and inside the region bounded by the performance guidelines. If data points fall outside the
region bounded by the performance guidelines, that is taken to be a signal that the degree of
association between the data series is weak, i.e. that they have a low degree of correlation. In
contrast to the usual approach, the performance guidelines in a Correlation Chart are thus a
test of the null hypothesis that there is some specified, high, degree of correlation between the
two data series against the alternative hypothesis that the degree of correlation is low.

The distance in the ordinate direction between the upper and lower performance guidelines in a
Correlation Chart may be calculated as the value $W$ of the range of a sample of size two from a standard
bivariate Normal probability distribution that will be exceeded with a small probability when the
correlation coefficient $\rho$ of the distribution is a specified, high, value. Numerical values of the
range $W$ are thus required in order that the Correlation Chart may be constructed. It may be
demonstrated by a non-trivial mathematical argument that the required range may be calculated
from the corresponding range $W^*$ for a sample of size two from a standard bivariate Normal
probability distribution when the correlation coefficient of the distribution is zero, using the
formula:

$$ W = W^{*}\sqrt{1 - \rho} $$

Values of $W^*$ are available in published statistical tables.
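For two standardised series the guideline spacing can be computed in closed form. At a single time the pair $(X, Y)$ is a sample of size two from a standard bivariate Normal distribution with correlation $\rho$, so $X - Y$ is Normal with variance $2(1 - \rho)$; the value of the range $|X - Y|$ exceeded with probability $\alpha$ is therefore $\sqrt{2(1-\rho)}\,z_{1-\alpha/2}$, which is the $\rho = 0$ value $W^* = \sqrt{2}\,z_{1-\alpha/2}$ multiplied by $\sqrt{1-\rho}$. The default $\alpha = 0.01$ below is an assumption, as the text only says "a small probability". For three or more series the same $\sqrt{1-\rho}$ factor applies when every pair shares correlation $\rho$, but $W^*$ must then come from tables of the range of independent standard Normal values, which this sketch does not reproduce.

```python
import math
from statistics import NormalDist


def w_star_two(alpha: float) -> float:
    """Range of two independent N(0, 1) values exceeded with probability alpha.

    Z1 - Z2 is N(0, 2), so P(|Z1 - Z2| > w) = alpha at w = sqrt(2) * z_{1 - alpha/2}.
    """
    return math.sqrt(2.0) * NormalDist().inv_cdf(1.0 - alpha / 2.0)


def guideline_width(rho: float, alpha: float = 0.01) -> float:
    """Ordinate distance W between the guidelines for two standardised series.

    rho is the specified high correlation under the null hypothesis;
    W = W* * sqrt(1 - rho), the rho = 0 range scaled by sqrt(1 - rho).
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    return w_star_two(alpha) * math.sqrt(1.0 - rho)


width = guideline_width(0.9)  # spacing for a hypothesised correlation of 0.9
```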

The above description of the Correlation Chart is presented in terms of just two data series.
However, the arguments and formulae may be applied without alteration when the
Correlation Chart is used to display three or more data series.

A method is provided for displaying a Regression Chart, which shows a measure plotted with the 
regression model calculated for that measure, so that goodness of fit can be visually determined. 

The statistical method known as regression consists of fitting a model of the form:

$$ y_{\text{predicted},t_i} = \beta_0 + \beta_1 x_{1,t_i} + \beta_2 x_{2,t_i} + \beta_3 x_{3,t_i} + \cdots + \beta_p x_{p,t_i} $$

where

$$ x_{1,t_i}, x_{2,t_i}, x_{3,t_i}, \ldots, x_{p,t_i} \qquad (i = 1, 2, 3, \ldots, n) $$

are $p$ series of data values ($p \ge 1$), known as "control variables" or "predictor variables" or
"drivers", and recorded at times

$$ t_i \qquad (i = 1, 2, 3, \ldots, n) $$

and

$$ y_{\text{predicted},t_i} \qquad (i = 1, 2, 3, \ldots, n) $$

are the values predicted by the model at the same series of times, and the parameters in the
model

$$ \beta_0, \beta_1, \beta_2, \ldots, \beta_p $$

are calculated from the observed values of the control variables and the observed values of the
response variable

$$ y_{\text{observed},t_i} \qquad (i = 1, 2, 3, \ldots, n), $$

so as to minimise the sum of squares of the residuals

$$ S = \sum_{i=1}^{n} r_{t_i}^2 $$

where the residuals are calculated as

$$ r_{t_i} = y_{\text{observed},t_i} - y_{\text{predicted},t_i} \qquad (i = 1, 2, 3, \ldots, n). $$

All the above constitutes established theory. The usual method that is employed to allow a visual
assessment to be made of how well the model predicts the response variable is to construct a
graph in which the data points have co-ordinates:

$$ (x_{j,t_i}, r_{t_i}) \qquad (i = 1, 2, 3, \ldots, n) $$

for one or more of the control variables (i.e. for one or more values of $j$), or to construct a graph
in which the data points have co-ordinates:

$$ (y_{\text{observed},t_i}, y_{\text{predicted},t_i}) \qquad (i = 1, 2, 3, \ldots, n). $$



However, instead of this, the Regression Chart shows the observed and predicted values of the
response variable plotted on one graph with common axes, displayed as the co-ordinates:

$$ (t_i, y_{\text{observed},t_i}) \qquad (i = 1, 2, 3, \ldots, n) $$

and

$$ (t_i, y_{\text{predicted},t_i}) \qquad (i = 1, 2, 3, \ldots, n) $$

so that time is the abscissa.



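Performance guidelines are calculated and displayed on the graph as lines $3\hat{\sigma}$ above and
below the line representing the predicted values of the response variable, where

$$ \hat{\sigma} = \sqrt{\frac{1}{n - p - 1} \sum_{i=1}^{n} r_{t_i}^2} $$

is the residual standard deviation.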

When new data become available, for a time period after that used to calculate the parameters in 
the model, the model is used to calculate the predicted values of the response variable, and the 
observed and predicted values continue to be plotted on the graph together with the 
performance guidelines. 

The benefits of this alternative approach are that: 

(a) It preserves the order in time of the data, which is an essential property of performance data. 

(b) The performance guidelines can be used to identify either individual times (i.e. positions on 
the time axis), or periods of time, where the observed values of the response variable differ 
significantly from the values predicted by the model. 

(c) The Regression Chart will show if the model remains valid for the time period after that used 
to calculate the parameters of the model - a breakdown in validity will be indicated by the 
occurrence of individual observed values of the response variable falling outside the region 
bounded by the performance guidelines, or by a run of observed values of the response variable 
all on the same side of the line representing the predicted values of the response variable. 
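A Regression Chart for one control variable ($p = 1$) can be sketched with the standard library. The guideline half-width assumes $3\hat{\sigma}$, with $\hat{\sigma}^2$ equal to the residual sum of squares divided by $n - p - 1$; the parameters are fitted on an initial period and then reused for later data, as described above. The function names and example data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = b0 + b1 * x (one control variable, p = 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def regression_chart(x_fit, y_fit, x_new=(), y_new=()) -> List[tuple]:
    """Return rows (index, observed, predicted, lower, upper, outside).

    Parameters come from the fitting period only (needs n > p + 1 points);
    later points reuse them. Guidelines are predicted +/- 3 * sigma_hat.
    """
    b0, b1 = fit_line(x_fit, y_fit)
    n, p = len(x_fit), 1
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x_fit, y_fit)]
    sigma_hat = math.sqrt(sum(r * r for r in residuals) / (n - p - 1))
    xs, ys = list(x_fit) + list(x_new), list(y_fit) + list(y_new)
    rows = []
    for i, (xi, yi) in enumerate(zip(xs, ys)):  # i stands in for the time t_i
        predicted = b0 + b1 * xi
        lower, upper = predicted - 3 * sigma_hat, predicted + 3 * sigma_hat
        rows.append((i, yi, predicted, lower, upper, not lower <= yi <= upper))
    return rows


chart = regression_chart([1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 7.8, 10.1, 12.0],
                         x_new=[7, 8], y_new=[14.1, 30.0])
```

Here the final observation (30.0 against a prediction near 16) falls outside the guidelines, the kind of breakdown in validity described in (c).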

A method is provided for modelling a process showing both a trend and cyclic variation with two 
or more seasons of data. 

The method will be described for the case when the time partition is a month. The presence of 
cyclic variation then means that if a month gives a low result in one year then there is a tendency 
for that month to give low results every year, or if a month gives a high result in one year then 
there is a tendency for that month to give high results every year. The presence of a trend then 
means that there is a tendency for the data to increase steadily over the period of two or more 
years, or that there is a tendency for the data to decrease steadily over the period of two or more 
years. 

When there are data for an exact number of years, the formulae needed to calculate the model
and performance guidelines for the process are comparatively simple. However, if the results
obtained are then applied to data for a period containing an incomplete year, the performance
guidelines are unsatisfactory, particularly when there are data for only two or three years.

To understand the reason for this, it is necessary to consider the correlations between individual 
data points and the model. Suppose that data for exactly two years, running from January one 
year to December the next year, are used to calculate the model. This means that the model for 
January is calculated as the average of the data for the two Januaries, adjusted for the trend. A 
data point for one of these Januaries will thus be highly correlated with the model for January. 






On the other hand, the data point for January in the third year will be uncorrelated with the
model because it has not been used to calculate the model. In order that the same significance
may be attached to any point that falls outside the region bounded by the performance
guidelines, the distance between the line on an sChart representing the model and the lines
representing the performance guidelines should take into account the degree of correlation
between the data point and the model. If the degree of correlation varies from one data point to
another, this distance should also vary.

However, it would be undesirable for the typical user of the sfn software to see the distance
between the model and the performance guidelines varying, as they would have difficulty
understanding the reason for this. The way to get around this difficulty is to adopt the principle
that all the data are always used to calculate the model. This means that a data point will always
be correlated with the corresponding model value, and it can be shown, by some non-trivial
mathematics, that the same significance can then be attached to any point that falls outside the
region bounded by the performance guidelines, to an acceptable level of approximation.

The formulae that are then needed to calculate the parameters of the model can be derived by 
some non-trivial mathematics and are as follows. 

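The method is based on fitting the model:

$$ y_{k,j} = a_j + b\,x_i + e_{k,j}, \qquad i = q(k-1) + j $$

to the data points $y_{k,j}$, where

$$ j = 1, 2, 3, \ldots, q $$

represents the months in a year ($q = 12$), and

$$ k = 1, 2, 3, \ldots, p $$

represents the years ($p \ge 2$).

The formulae apply when there are complete data for the first $p-1$ years, but data for only the first
$q'$ months for the last year, year $p$, and when $p \times q \ge 24$.

The parameters in the model are calculated as the values that cause the sum of squares to be
minimised:

$$ S = \sum_{k=1}^{p} \sum_{j} \left( y_{k,j} - a_j - b\,x_i \right)^2 $$

In this summation, for the last year (when $k = p$), $j = 1, \ldots, q'$ with $q' < q$; for the earlier years,
$j = 1, \ldots, q$.

For $j \le q'$ we calculate $a_j$ from:

$$ \bar{y}_{\cdot j} = a_j + b\,\bar{x}_{\cdot j} $$

where

$$ \bar{y}_{\cdot j} = \frac{1}{p} \sum_{k=1}^{p} y_{k,j}, \qquad \bar{x}_{\cdot j} = \frac{1}{p} \sum_{k=1}^{p} x_{q(k-1)+j} $$

For $j > q'$ we calculate $a_j$ from:

$$ \bar{y}'_{\cdot j} = a_j + b\,\bar{x}'_{\cdot j} $$

where

$$ \bar{y}'_{\cdot j} = \frac{1}{p-1} \sum_{k=1}^{p-1} y_{k,j}, \qquad \bar{x}'_{\cdot j} = \frac{1}{p-1} \sum_{k=1}^{p-1} x_{q(k-1)+j} $$

Thus if we know the trend parameter $b$ we can calculate the parameters $a_j$ representing the
cyclic variation. The trend parameter $b$ is calculated as: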
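As a hedged illustration, setting $\partial S/\partial a_j = 0$ gives the month equations for $a_j$, and setting $\partial S/\partial b = 0$ and substituting for $a_j$ gives the standard least-squares (pooled within-month) slope

$$ b = \frac{\sum_{k,j} (x_i - \bar{x}_{\cdot j})(y_{k,j} - \bar{y}_{\cdot j})}{\sum_{k,j} (x_i - \bar{x}_{\cdot j})^2} $$

where the sums run over the observed $(k, j)$ pairs and each mean is taken over the years available for month $j$ ($p$ years for $j \le q'$, $p - 1$ years for $j > q'$). This is the textbook solution and may be written differently from the expression intended here. The sketch below assumes $x_i = i$ and uses all the data, in line with the principle stated above; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_trended_cyclic(y: Sequence[float], q: int = 12) -> Tuple[List[float], float]:
    """Least-squares fit of y_i = a_j + b * x_i with i = q*(k-1) + j and x_i = i.

    `y` holds every observation in time order; the last season may be
    incomplete. Returns (a, b): a[j-1] is the cyclic parameter for period j
    of the season and b is the trend per period.
    """
    n = len(y)
    if n <= q:
        raise ValueError("need more than one full season of data")
    # Group (x_i, y_i) pairs by period j = ((i - 1) mod q) + 1, taking x_i = i.
    groups: List[List[Tuple[int, float]]] = [[] for _ in range(q)]
    for i, value in enumerate(y, start=1):
        groups[(i - 1) % q].append((i, value))
    # Per-period means over p seasons (j <= q') or p - 1 seasons (j > q').
    means = [(sum(x for x, _ in g) / len(g), sum(v for _, v in g) / len(g)) for g in groups]
    # dS/db = 0 gives the pooled within-period slope.
    num = sum((x - mx) * (v - my) for g, (mx, my) in zip(groups, means) for x, v in g)
    den = sum((x - mx) ** 2 for g, (mx, _my) in zip(groups, means) for x, _v in g)
    b = num / den
    # dS/da_j = 0 gives a_j = mean_y_j - b * mean_x_j.
    a = [my - b * mx for mx, my in means]
    return a, b


# Usage with quarterly data (q = 4): two full seasons plus q' = 2 periods.
true_a, true_b = [10.0, 20.0, 5.0, 15.0], 0.5
y = [true_a[(i - 1) % 4] + true_b * i for i in range(1, 11)]
a_hat, b_hat = fit_trended_cyclic(y, q=4)
```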

A method is also provided that allows the server and client components of an embodiment of the
invention to run together on a single machine without the need for an application server.

Interaction between the client and server components of the software occurs in the Submitter
class on the client, via a single Java class which implements the interface SfnFactoryI. In the
case of the Enterprise Version, this is the class TunnellingSfnFactory, which communicates over
HTTP with a Servlet (SfnFactoryServlet) running on an Application Server. The Servlet
contains an instance of the class SfnFactory (which also implements the interface SfnFactoryI),
which carries out all of the server-side functionality.






In the standalone version, everything runs on the client in a single Applet. In this case the
implementation of the SfnFactoryI interface used by the Submitter class is the SfnFactory class,
obviating the need for an Application Server running a Servlet.





The Submitter class tries to load an instance of the SfnFactory class, to give standalone
behaviour, but if this fails it loads an instance of TunnellingSfnFactory. This (required) failure in
the Enterprise version is achieved by not supplying the SfnFactory class in the JAR file from
which the Applet is loaded.
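The load-with-fallback pattern is described for Java; the following Python sketch mirrors it using `importlib`. The module name `sfn_standalone`, the `submit` method and the servlet URL are hypothetical stand-ins, and the tunnelling client only tags requests instead of sending HTTP. As in the Enterprise build, leaving the standalone class out of the deployed package is what triggers the fallback.

```python
import importlib
from typing import Protocol


class SfnFactoryI(Protocol):
    """Interface shared by the in-process and tunnelling factories."""
    def submit(self, request: str) -> str: ...


class SfnFactory:
    """Stand-in for the server-side implementation, run in-process."""
    def submit(self, request: str) -> str:
        return f"local:{request}"


class TunnellingSfnFactory:
    """Stand-in for the client that forwards requests to SfnFactoryServlet."""
    def __init__(self, servlet_url: str = "http://localhost:8080/sfn"):
        self.servlet_url = servlet_url

    def submit(self, request: str) -> str:
        # A real client would send the request to the servlet over HTTP.
        return f"tunnel:{self.servlet_url}:{request}"


def load_factory(standalone_module: str = "sfn_standalone") -> SfnFactoryI:
    """Prefer the standalone SfnFactory; fall back to tunnelling if it is absent."""
    try:
        module = importlib.import_module(standalone_module)
        return module.SfnFactory()
    except (ImportError, AttributeError):
        # Enterprise build: the standalone class is deliberately not shipped.
        return TunnellingSfnFactory()
```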

A method is also provided that allows a user or session to override the default measure definitions
by means of a system of Layered Measure Profiles. A Measure Profile can be assigned to a User, and a
'Current' Measure Profile can be set for the session; these provide layers above the base
Measure Definitions. Parameters that can be overridden include Display Options, Aggregation
method, Data Filtering, and even the name of the Data Table used for the Measure (to allow pre-
aggregated data to be used in some cases, for example).
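Layered lookup maps naturally onto Python's `collections.ChainMap`, which searches its mappings in order. This sketch assumes the session's 'Current' profile takes precedence over the user's profile, which in turn overrides the base Measure Definition; the parameter keys and table names are hypothetical.

```python
from collections import ChainMap
from typing import Mapping, Optional

base_definition = {
    "display": "sChart",
    "aggregation": "sum",
    "filter": None,
    "data_table": "tblCalls",
}
user_profile = {"aggregation": "mean"}                 # Measure Profile assigned to the User
session_profile = {"data_table": "tblCallsMonthly"}   # 'Current' profile for the session


def effective_definition(base: Mapping, user: Optional[Mapping] = None,
                         session: Optional[Mapping] = None) -> ChainMap:
    """Look up each parameter in the session layer, then the user layer, then the base."""
    return ChainMap(dict(session or {}), dict(user or {}), base)


measure = effective_definition(base_definition, user_profile, session_profile)
```

Writes to a ChainMap go to its first mapping, so session-level changes never alter the shared base definition.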



Alternative Implementations 

There are other software tools in the marketplace, such as Minitab, that display SPC charts from data
input to them. These are tools for the statistician and differ from our invention in that these tools neither
cater for an organisation (the organisational hierarchy) nor provide statistical functions applicable to
modelling a business.
Claims

1. A method for taking process-related data (measure data) from external system(s), processing
that data, and storing it in a form that allows any individual to select and view an SPC chart that
is of interest or uniquely relevant to them, comprising:

a. A database server implementing a schema that allows different measures to be defined and
each and every measure value to be associated with a date/time stamp and a user in an
organisational hierarchy.

b. An application server running the server component of the invention that:

i. Presents to the user a mechanism for selecting where in an organisation 
hierarchy they wish measure values to be viewed from and what measure they 
would like to see. 

ii. Provides a choice of different charts for the given selection, including a
dashboard of Dials that show the current value for each measure, a Benchmark
chart that allows comparison of a selected measure across an organisational
hierarchy, a Pareto chart that allows drill-down through the measure levels, and
an sChart, which is a form of SPC chart.

iii. Delivers the requested chart on demand to the user's Browser. 

c. A web Browser connected via a computer network to the application server that 
loads the client component of the invention which interprets and displays the chart 
information sent from the server. 

2. A method according to claim 1, that allows any user to select charts that are appropriate and
permissible for them to view, comprising: 

a. An organisational hierarchy presented in a tree format from which the user selects the
point in the organisation from which charts are to be displayed, subject to the user's
permission level (see Fig. 1).

b. A measure hierarchy presented in a tree format from which the user selects the
measure or measure attribute they wish to view (see Fig. 1).

c. A menu of functions applicable to the selected points in the hierarchies comprising: 

i. A function to display a Dashboard of Dials for all measures for the selected point 
in the organisational hierarchy. 

ii. A function to display an sChart 

iii. A function to display a Benchmark chart 

iv. A function to display a Pareto chart 

v. A function to display the measure data in a tabular form. 

3. A method according to claim 1, that shows the current value for each measure, comprising:

a. A Dial with highlighted segments that represent the normal variability of the measure
data, having values derived from the Performance Guidelines of the related sChart (see
Fig. 2).

b. A pointer that represents the current value of the measure data. 

c. A menu of functions associated with the measure and organisational point comprising: 

i. A function to display an sChart 

ii. A function to display a Benchmark chart

iii. A function to display a Pareto chart 

iv. A function to display the measure data in a tabular form. 

4. A method according to claim 1, that shows the comparison of a selected measure across an
organisational hierarchy, comprising:

a. A Benchmark chart having on the X axis the names of each organisation which is a
child of the selected point in the organisation, and on the Y axis, a scale suitable for
displaying any measure value found in the measure in the organisations listed on the X
axis.

b. For each organisation listed on the X axis, a vertical bar whose height and position on
the chart represents the normal variability of the measure data, having values derived
from the Performance Guidelines on the related sChart (see Fig. 2).

c. On each vertical bar, marks indicating the average and the current value of the measure 
data for the particular organisation the vertical bar represents. 



d. A menu of functions associated with any particular vertical bar which the pointing device 
of the user's computer is currently over, comprising: 

i. A function to display an sChart 

ii. A function to display a Benchmark chart 

iii. A function to display a Pareto chart 

iv. A function to display the measure data in a tabular form. 

5. A method according to claim 1, that allows drill-down through the measure data, comprising:

a. A Pareto chart having on the X-axis the names of each attribute that is a child of the 
selected measure or attribute, and on the Y axis, a scale starting at zero and suitable for 
displaying any value found in the entries listed on the X axis. 

b. For each entry on the X-axis, a vertical bar representing the sum of all measure values
for that entry.

c. A menu of functions associated with any particular vertical bar which the pointing device
of the user's computer is currently over, comprising:

i. A function to display a Pareto chart 

ii. A function to display an sChart 

iii. A function to display a Benchmark chart 

iv. A function to display the measure data in a tabular form.
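The Pareto bar heights in item b are sums of measure values per child attribute. Sorting the bars from largest to smallest is the usual Pareto convention and is assumed here rather than stated in the claim; the attribute name `fault` and the example rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def pareto_bars(rows: List[Dict[str, object]], attribute: str,
                value: str = "value") -> List[Tuple[object, float]]:
    """Sum measure values per child attribute, largest bar first."""
    totals: Dict[object, float] = defaultdict(float)
    for row in rows:
        totals[row[attribute]] += row[value]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


rows = [
    {"fault": "Power", "value": 3},
    {"fault": "Network", "value": 5},
    {"fault": "Power", "value": 4},
]
bars = pareto_bars(rows, "fault")
```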

6. A method according to claim 1, that allows an SPC chart to be viewed for a particular point in the
organisational hierarchy and for a particular measure or measure attribute, comprising: 

a. An sChart having on the X-axis the dimension of the measure and on the Y-axis a scale
suitable for displaying any measure value found in the measure.

b. For each measure value, a point drawn on the sChart whose X-axis value corresponds 
to the value's Date/Time, and whose Y-axis value corresponds to the measure value. 
Each point is connected via a line to the previous point and subsequent point bar the first 
and last points. 

c. A horizontal average line calculated from the measure data. 

d. Two horizontal Process Guidelines calculated from the average line, one set such that
all points above the guideline are of statistical significance and one set such that any
points below the guideline are of statistical significance. A value is of statistical
significance if it lies more than 2.66 times the average moving range from the
average line.

e. A menu of functions associated with the sChart comprising: 

i. A function to display a Pareto chart 

ii. A function to display a Benchmark chart. 

f. Context-sensitive information attached to the pointing device tip (e.g. mouse cursor),
comprising:

i. When over a point, display of the measure value.

ii. When under the X-axis, display of the measure value and details of any statistical
process that has been applied to the measure data.
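The guideline rule in item d can be sketched as an individuals (XmR-style) calculation. The constant 2.66 is the conventional $3/d_2$ with $d_2 \approx 1.128$ for moving ranges of two consecutive points. The function name and example values are hypothetical, and process breaks and exclusions are not handled here.

```python
from typing import List, Sequence, Tuple


def schart_limits(values: Sequence[float]) -> Tuple[float, float, float, List[int]]:
    """Average line and Process Guidelines for an sChart of individual values.

    Guidelines sit 2.66 x the average moving range above and below the average.
    Returns (average, lower, upper, indices of statistically significant points).
    """
    if len(values) < 2:
        raise ValueError("need at least two values to form a moving range")
    average = sum(values) / len(values)
    # Moving range: absolute difference between each point and the previous one.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    lower, upper = average - 2.66 * mr_bar, average + 2.66 * mr_bar
    signals = [i for i, v in enumerate(values) if v < lower or v > upper]
    return average, lower, upper, signals


average, lower, upper, signals = schart_limits([10, 12, 11, 13, 12, 11, 30])
```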

7. A method of alerting a user that there is a signal in the recent data of the sChart related to a Dial.
The background of the Dial changes to a different colour according to the following signals:

a. A run of points above the average 

b. A run of points below the average 

c. A single point above the upper Process Guideline 

d. A single point below the lower Process Guideline. 
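The four Dial signals can be checked against the most recent data as below. The run length of 7 is an assumption (the claim does not fix one), and the returned labels stand in for whatever background colours an implementation would use; the function name is hypothetical.

```python
from typing import Optional, Sequence


def dial_signal(values: Sequence[float], average: float, lower: float, upper: float,
                run_length: int = 7) -> Optional[str]:
    """Return the signal shown by a Dial's background colour, or None.

    Checks the latest point against the Process Guidelines first, then the
    trailing run of `run_length` points on one side of the average.
    """
    last = values[-1]
    if last > upper:
        return "point above upper guideline"
    if last < lower:
        return "point below lower guideline"
    recent = values[-run_length:]
    if len(recent) == run_length:
        if all(v > average for v in recent):
            return "run above average"
        if all(v < average for v in recent):
            return "run below average"
    return None
```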

8. A method of applying statistical functions to an SPC chart while it is displayed, which allow
behaviour in the real world to be modelled, comprising:

a. A function to insert a Process Break at a particular measure value. Calculation of the 
average and Process Guidelines restart from the selected measure value. 

b. A function to model a cyclic process. 

c. A function to model a trended process 

d. A function to model a trended cyclic process 

e. A function to exclude outlying data points

f. A function to limit the data range used in calculations 

g. A function to annotate individual measure values. 



9. A method of storing organisational information separate from the measure data so that the 
organisational hierarchy can be modified without change to the measure data. 

10. A method of storing measure information separate from the measure data so that the measure
hierarchy can be modified without change to the measure data.

11. A method of jumping via a URL link from the underlying measure data to a third party application.

12. A method of selecting data from multiple points in both the organisational hierarchy and the 
metric hierarchy, to allow disjunction in the case of multiple points selected within the same 
dimension, and multidimensional analysis in the case of points selected in different dimensions. 

13. A method of displaying a Correlation chart, which shows two or more measures displayed on the 
same chart together with guidelines that allow the degree of correlation to be assessed visually. 

14. A method of displaying a Regression chart, which shows a measure plotted with the regression 
model calculated for that measure , so that goodness of fit can be visually determined. 

15. A method of modelling a cyclic process with only two seasons of data.

16. A method according to claim 1, that allows the server and client components of the invention to 
run together on a single machine without the need for an application server. 

17. A method for allowing a user/session to override the default measure definitions. 